Tractable Algorithms for Proximity Search on Large Graphs

نویسندگان

  • Purnamrita Sarkar
  • Geoffrey J. Gordon
  • Anupam Gupta
  • Jon Kleinberg
چکیده

Identifying the nearest neighbors of a node in a graph is a key ingredient in a diverse set of ranking problems, e.g. friend suggestion in social networks, keyword search in databases, web-spam detection etc. For finding these “near” neighbors, we need graph theoretic measures of similarity or proximity. Most popular graph-based similarity measures, e.g. length of shortest path, the number of common neighbors etc., look at the paths between two nodes in a graph. One such class of similarity measures arise from random walks. In the context of using these measures, we identify and address two important problems. First, we note that, while random walk based measures are useful, they are often hard to compute. Hence we focus on designing tractable algorithms for faster and better ranking using random walk based proximity measures in large graphs. Second, we theoretically justify why path-based similarity measures work so well in practice. For the first problem, we focus on improving the quality and speed of nearest neighbor search in real-world graphs. This work consists of three main components: first we present an algorithmic framework for computing nearest neighbors in truncated hitting and commute times, which are proximity measures based on short term random walks. Second, we improve upon this ranking by incorporating user feedback, which can counteract ambiguities in queries and data. Third, we address the problem of nearest neighbor search when the underlying graph is too large to fit in main memory. We also prove a number of interesting theoretical properties of these measures, which have been key to designing most of the algorithms in this thesis. We address the second problem by bringing together a well known generative model for link formation, and geometric intuitions. As a measure of the quality of ranking, we examine link prediction, which has been the primary tool for evaluating the algorithms in this thesis. Link prediction has been extensively studied in prior empirical surveys. Our work helps us better understand some common trends in the predictive performance of different measures seen across these empirical results.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

META-HEURISTIC ALGORITHMS FOR MINIMIZING THE NUMBER OF CROSSING OF COMPLETE GRAPHS AND COMPLETE BIPARTITE GRAPHS

The minimum crossing number problem is among the oldest and most fundamental problems arising in the area of automatic graph drawing. In this paper, eight population-based meta-heuristic algorithms are utilized to tackle the minimum crossing number problem for two special types of graphs, namely complete graphs and complete bipartite graphs. A 2-page book drawing representation is employed for ...

متن کامل

Fast Algorithms for Proximity Search on Large Graphs

The main focus of this proposal is on understanding and analyzing entity relationships in large social networks. The broad range of applications of graph based learning problems includes collaborative filtering in recommender networks, link prediction in social networks (e.g. predicting future links from the current snapshot of a graph), fraud detection and personalized graph search. In all the...

متن کامل

Reverse Top-k Search using Random Walk with Restart

With the increasing popularity of social networks, large volumes of graph data are becoming available. Large graphs are also derived by structure extraction from relational, text, or scientific data (e.g., relational tuple networks, citation graphs, ontology networks, protein-protein interaction graphs). Node-to-node proximity is the key building block for many graph-based applications that sea...

متن کامل

A Solution Merging Heuristic for the Steiner Problem in Graphs Using Tree Decompositions

Fixed parameter tractable algorithms for bounded treewidth are known to exist for a wide class of graph optimization problems. While most research in this area has been focused on exact algorithms, it is hard to find decompositions of treewidth sufficiently small to make these algorithms fast enough for practical use. Consequently, tree decomposition based algorithms have limited applicability ...

متن کامل

Keyword Proximity Search on XML Graphs

XKeyword provides efficient keyword proximity queries on large XML graph databases. A query is simply a list of keywords and does not require any schema or query language knowledge for its formulation. XKeyword is built on a relational database and, hence, can accommodate very large graphs. Query evaluation is optimized by using the graph’s schema. In particular, XKeyword consists of two stages...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010